Datasets Documentation
This documentation focuses on key datasets and utility functions used in the project.
- class encoding_information.datasets.bsccm_utils.BSCCMDataset(path)
Bases:
MeasurementDatasetBase
Dataset class for BSCCM (Berkeley Single Cell Computational Microscopy) data.
This class provides methods to load BSCCM data, process it, and apply noise models.
- _bsccm
BSCCM object to interface with the dataset.
- Type:
BSCCM
- __init__(path)
Initialize the BSCCM dataset.
- Parameters:
path (str) – Path to the BSCCM dataset.
- get_measurements(num_measurements, mean=None, bias=0, noise='Poisson', data_seed=None, noise_seed=None, edge_crop=24, channels='DPC_Left')
Get a set of measurements from the dataset, with optional noise and bias.
- Parameters:
num_measurements (int) – Number of measurements to retrieve.
mean (float, optional) – Mean value to scale the images by.
bias (float, optional) – Bias to add to the images (default is 0).
noise (str, optional) – Type of noise to apply. ‘Poisson’ is supported (default is ‘Poisson’).
data_seed (int, optional) – Seed for random selection of images (default is None).
noise_seed (int, optional) – Seed for generating noise (default is None).
edge_crop (int, optional) – Number of pixels to crop from the edges (default is 24).
channels (str or list of str, optional) – Channels to retrieve (default is ‘DPC_Left’).
- Returns:
Array of measurements with optional noise and bias.
- Return type:
np.ndarray
- Raises:
NotImplementedError – If unsupported noise type is provided.
Exception – If the requested number of measurements exceeds available data or if a rescale fraction is invalid.
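The parameters above suggest a three-step pipeline: rescale to a target mean, add a bias, then apply Poisson noise. The following is a minimal NumPy sketch of that idea, not the library's implementation. The function name simulate_measurements, the order of the scaling and bias steps, and the handling of noise=None are all assumptions made for illustration.

```python
import numpy as np

def simulate_measurements(images, mean=None, bias=0.0, noise="Poisson", noise_seed=None):
    """Rescale clean images to a target mean, add a bias, then apply noise (hypothetical helper)."""
    images = np.asarray(images, dtype=np.float64)
    if mean is not None:
        # Scale so that the average pixel value equals the requested mean.
        images = images * (mean / images.mean())
    # Assumption: the bias is added after rescaling.
    images = images + bias
    if noise is None:
        return images
    if noise != "Poisson":
        raise NotImplementedError(f"Unsupported noise type: {noise!r}")
    rng = np.random.default_rng(noise_seed)
    # Poisson (shot) noise: each pixel is drawn with its clean value as the rate.
    return rng.poisson(images).astype(np.float64)

# Synthetic stand-in for clean images; real data would come from the dataset.
clean = np.random.default_rng(0).uniform(1.0, 10.0, size=(8, 16, 16))
noiseless = simulate_measurements(clean, mean=100.0, bias=5.0, noise=None)
noisy = simulate_measurements(clean, mean=100.0, bias=5.0, noise_seed=1)
```

With noise=None the result has a mean of mean + bias, which makes the effect of each parameter easy to check before noise is added.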
- get_shape(channels='DPC_Left', edge_crop=24)
Return the shape of the dataset images for specified channels.
- Parameters:
channels (str or list of str, optional) – Channels to include in the shape. Default is ‘DPC_Left’.
edge_crop (int, optional) – Number of pixels to crop from the edges of the images (default is 24).
- Returns:
Shape of the dataset images.
- Return type:
tuple
- class encoding_information.datasets.cfa_dataset.ColorFilterArrayDataset(zarr_path, tile_size=128)
Bases:
MeasurementDatasetBase
Dataset of natural images with various Bayer-like filters applied.
This dataset is based on Shi’s re-processing of Gehler’s Raw Dataset, which consists of 568 images. The Bayer-like filters simulate a color filter array, such as the one used in digital cameras.
References:
Lilong Shi and Brian Funt, “Re-processed Version of the Gehler Color Constancy Dataset of 568 Images.”
Peter Gehler, Carsten Rother, Andrew Blake, Tom Minka, and Toby Sharp, “Bayesian Color Constancy Revisited,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
- __init__(zarr_path, tile_size=128)
Initialize the dataset and split the images into non-overlapping tiles.
- Parameters:
zarr_path (str) – Path to the Zarr store containing the dataset.
tile_size (int) – Size of the tiles to split the images into. Images are divided into non-overlapping tiles of size (tile_size, tile_size). Default is 128.
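Splitting an image into non-overlapping tiles can be done with a reshape and a transpose. The sketch below is illustrative only: split_into_tiles is a hypothetical helper, and dropping partial tiles at the bottom and right edges is an assumption, since the documentation does not say how leftover pixels are handled.

```python
import numpy as np

def split_into_tiles(image, tile_size=128):
    """Split an (H, W, C) image into non-overlapping (tile_size, tile_size, C) tiles."""
    h, w, c = image.shape
    rows, cols = h // tile_size, w // tile_size
    # Assumption: partial tiles at the bottom and right edges are discarded.
    trimmed = image[: rows * tile_size, : cols * tile_size]
    tiles = trimmed.reshape(rows, tile_size, cols, tile_size, c)
    # Bring the two tile-grid axes together, then flatten them into one tile axis.
    return tiles.transpose(0, 2, 1, 3, 4).reshape(rows * cols, tile_size, tile_size, c)

# Synthetic 300 x 260 image with 4 channels; yields a 2 x 2 grid of 128 x 128 tiles.
image = np.arange(300 * 260 * 4).reshape(300, 260, 4)
tiles = split_into_tiles(image, tile_size=128)
```

Tiles are ordered row by row, so tiles[1] is the block to the right of tiles[0].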
- get_measurements(num_measurements=None, mean=2000, bias=0, filter_matrix=array([[0, 1], [1, 2]]), data_seed=None, noise_seed=None, noise='Poisson', tile_indices=None)
Get a set of measurements from the dataset by applying a Bayer-like filter.
The default filter matrix simulates the pattern used in Bayer filters for RGB+White channels. The images are rescaled to match a desired mean photon count and can be further corrupted by noise (Poisson or Gaussian).
- Parameters:
num_measurements (int) – Number of measurements to generate.
mean (float, optional) – Mean value to scale the images by, corresponding to the number of photons per pixel in the white channel. Default is 2000.
bias (float, optional) – Bias to add to the measurements. Default is 0.
filter_matrix (ndarray, optional) – Filter matrix to apply to the measurements. Default is [[0, 1], [1, 2]] (Bayer pattern).
data_seed (int, optional) – Random seed for selecting tiles from the dataset.
noise_seed (int, optional) – Random seed for noise generation.
noise (str, optional) – Type of noise to add to the measurements. Options are ‘Poisson’ (default), ‘Gaussian’, or None.
tile_indices (list, optional)
- Returns:
filtered_tiles – Array of measurements with shape (num_measurements, H, W, 4), where the channels correspond to R, G, B, W.
- Return type:
ndarray
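To illustrate how a 2 x 2 filter matrix maps onto pixels, the sketch below repeats the matrix across a tile and keeps, at each pixel, only the channel the matrix names (0 = R, 1 = G, 2 = B, 3 = W). This is an assumption about what "applying" the filter means. The helper apply_filter_matrix is hypothetical and returns a single-channel mosaic, which differs from the four-channel return shape documented above, so treat it as an illustration of the pattern rather than of the method's output.

```python
import numpy as np

def apply_filter_matrix(tiles, filter_matrix):
    """Mosaic (N, H, W, C) tiles: each pixel keeps the channel named by the tiled pattern."""
    n, h, w, _ = tiles.shape
    ph, pw = filter_matrix.shape
    # Repeat the small pattern over the whole tile (ceil division), then trim to (H, W).
    reps = (-(-h // ph), -(-w // pw))
    channel_map = np.tile(filter_matrix, reps)[:h, :w]
    # Pick, for every pixel, the value of its assigned channel.
    index = np.broadcast_to(channel_map[None, :, :, None], (n, h, w, 1))
    return np.take_along_axis(tiles, index, axis=-1)[..., 0]

# Constant test tiles: R=10, G=20, B=30, W=40 at every pixel.
tiles = np.tile(np.array([10.0, 20.0, 30.0, 40.0]), (2, 4, 4, 1))
mosaic = apply_filter_matrix(tiles, np.array([[0, 1], [1, 2]]))
```

With the default matrix, each 2 x 2 block of the mosaic reads R, G on the first row and G, B on the second, which is the familiar Bayer layout.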
- get_shape(tile_size=128)
Return the shape of the dataset based on the given tile size.
- Parameters:
tile_size (int, optional) – Size of the tiles in the dataset. Default is 128.
- Returns:
Shape of the dataset (number of tiles, tile height, tile width).
- Return type:
tuple
- class encoding_information.datasets.hml_dataset.HyperspectralMetalensDataset(h5_dir, center_crop=None)
Bases:
MeasurementDatasetBase
Dataset of grayscale measurements captured with a metalens-based camera.
This dataset consists of images captured with a hyperspectral metalens camera, offering grayscale measurements. The data is loaded from .h5 files, and users can apply various preprocessing steps, including center cropping and noise addition.
- __init__(h5_dir, center_crop=None)
Initialize the dataset by loading images from the specified directory.
- Parameters:
h5_dir (str) – Directory containing the .h5 files.
center_crop (int, optional) – Number of pixels to crop from each side of the images (default is None).
- _center_crop(data, crop_size)
Center crop the data by crop_size pixels from each side.
- Parameters:
data (np.ndarray) – Image data to crop.
crop_size (int) – Number of pixels to crop from each side.
- Returns:
Cropped image.
- Return type:
np.ndarray
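Cropping crop_size pixels from each side reduces each spatial dimension by 2 * crop_size. A minimal sketch follows, assuming the spatial dimensions are the last two axes of the array. That axis layout is a guess, and center_crop here is a stand-in rather than the private method itself.

```python
import numpy as np

def center_crop(data, crop_size):
    """Remove crop_size pixels from every edge of the last two (spatial) axes."""
    if crop_size == 0:
        # A slice ending at -0 would be empty, so return the data unchanged.
        return data
    return data[..., crop_size:-crop_size, crop_size:-crop_size]

images = np.zeros((5, 64, 48))
cropped = center_crop(images, 8)  # each spatial axis shrinks by 16
```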
- get_measurements(num_measurements, mean=None, bias=0, data_seed=21, noise_seed=123456, noise='Poisson')
Get a set of measurements from the dataset, with optional noise and bias.
This method retrieves random images from the dataset, applies optional mean scaling, bias, and noise (Poisson or Gaussian).
- Parameters:
num_measurements (int) – Number of measurements to return.
mean (float, optional) – Mean value to scale the measurements. If None, no scaling is applied (default is None).
bias (float, optional) – Bias to be added to the measurements (default is 0).
data_seed (int, optional) – Seed for random data selection (default is 21).
noise_seed (int, optional) – Seed for noise generation (default is 123456).
noise (str, optional) – Type of noise to apply. Can be ‘Poisson’, ‘Gaussian’, or None (default is ‘Poisson’).
- Returns:
Measurements with optional noise and bias.
- Return type:
np.ndarray
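Separate data_seed and noise_seed values allow the image selection to be held fixed while the noise realization varies. The sketch below shows one way this could work, using two independent NumPy generators. The helper sample_noisy and the use of sampling without replacement are assumptions for illustration.

```python
import numpy as np

def sample_noisy(images, num, data_seed, noise_seed):
    """Pick `num` images with one generator and add Poisson noise with another."""
    data_rng = np.random.default_rng(data_seed)
    noise_rng = np.random.default_rng(noise_seed)
    # Assumption: images are selected without replacement.
    indices = data_rng.choice(len(images), size=num, replace=False)
    return indices, noise_rng.poisson(images[indices])

# Synthetic stand-in for the loaded images.
images = np.random.default_rng(0).uniform(20.0, 200.0, size=(50, 8, 8))
idx_a, noisy_a = sample_noisy(images, 10, data_seed=21, noise_seed=123456)
idx_b, noisy_b = sample_noisy(images, 10, data_seed=21, noise_seed=123456)
idx_c, noisy_c = sample_noisy(images, 10, data_seed=21, noise_seed=99)
```

Here the first two calls reproduce each other exactly, while the third selects the same images but draws different noise.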
- get_shape(**kwargs)
Return the shape of the dataset.
- Parameters:
kwargs – Additional parameters.
- Returns:
Shape of the dataset.
- Return type:
tuple
- class encoding_information.datasets.mnist_dataset.MNISTDataset
Bases:
MeasurementDatasetBase
Wrapper class for the MNIST dataset.
This class wraps the MNIST dataset, providing an interface for retrieving measurements from the dataset, with optional noise and bias applied.
- __init__()
Initialize the MNIST dataset by downloading it if necessary.
The dataset is loaded using TensorFlow’s keras.datasets.mnist API. The training and test data are concatenated to create a single dataset.
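The concatenation step can be sketched with NumPy alone. To avoid a download, the example uses small random arrays as stand-ins for the training and test images that keras.datasets.mnist.load_data() returns. The variable names and the reduced array sizes are illustrative assumptions.

```python
import numpy as np

# Small random stand-ins for the MNIST image arrays; the real splits hold
# 60,000 training and 10,000 test images of 28 x 28 pixels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(600, 28, 28), dtype=np.uint8)
x_test = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Stack along the first (sample) axis so every image is available for selection.
all_images = np.concatenate([x_train, x_test], axis=0)
```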
- get_measurements(num_measurements, mean=None, bias=0, noise='Poisson', data_seed=None, noise_seed=None)
Retrieve a set of measurements from the MNIST dataset with optional noise and bias.
- Parameters:
num_measurements (int) – Number of measurements to return.
mean (float, optional) – Mean value to scale the measurements. If None, no scaling is applied (default is None).
bias (float, optional) – Bias to add to the measurements (default is 0).
noise (str, optional) – Type of noise to apply. Options are ‘Poisson’ or None (default is ‘Poisson’).
data_seed (int, optional) – Seed for random selection of images from the dataset (default is None).
noise_seed (int, optional) – Seed for noise generation (default is None).
- Returns:
Array of selected measurements with optional noise and bias applied.
- Return type:
np.ndarray
- Raises:
Exception – If the requested number of measurements exceeds the available dataset size, or if an unsupported noise type is provided.
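The size check described under Raises might look like the following sketch. The helper select_images is hypothetical, and raising ValueError (a subclass of Exception) is an assumption, since the documentation only specifies Exception.

```python
import numpy as np

def select_images(images, num_measurements, data_seed=None):
    """Return a random subset of images, refusing requests larger than the dataset."""
    if num_measurements > len(images):
        # ValueError is a subclass of Exception; the exact type is an assumption.
        raise ValueError(
            f"Requested {num_measurements} measurements, but only {len(images)} are available."
        )
    rng = np.random.default_rng(data_seed)
    indices = rng.choice(len(images), size=num_measurements, replace=False)
    return images[indices]

images = np.zeros((100, 28, 28))
subset = select_images(images, 10, data_seed=0)
```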
- get_shape()
Return the shape of the MNIST dataset images.
- Returns:
Shape of the MNIST images (height, width).
- Return type:
tuple